Use pin_memory in forward_batch.init_new to reduce decoding latency by litmei · Pull Request #21360 · sgl-project/sglang

litmei · 2026-03-25T04:06:15Z

Motivation

In low-latency scenarios, substantial idle gaps appear between decode phases.

Modifications

Profiling identified a host-to-device (H2D) synchronization bottleneck in forward_batch.init_new. Calling .pin_memory() on the host tensors places them in page-locked memory, which allows the H2D copy to run asynchronously instead of blocking the host.
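The effect described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: the helper name `to_device_maybe_pinned` is hypothetical, and it assumes a CUDA target plus `non_blocking=True`, neither of which the description spells out. On machines without CUDA it falls back to an ordinary copy.

```python
import torch


def to_device_maybe_pinned(cpu_tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Copy a CPU tensor to `device`, pinning it first when the target is CUDA.

    Hypothetical helper for illustration; not part of SGLang.
    """
    use_pin = device.type == "cuda" and torch.cuda.is_available()
    if use_pin:
        # Page-locked (pinned) host memory lets the copy be queued on the
        # current stream, so the host thread does not have to wait for it.
        cpu_tensor = cpu_tensor.pin_memory()
    # non_blocking=True only helps for pinned-host -> device copies;
    # in the fallback path this is a plain copy.
    return cpu_tensor.to(device, non_blocking=use_pin)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
positions = torch.arange(8, dtype=torch.int64)
on_device = to_device_maybe_pinned(positions, device)
```

Note that `.pin_memory()` itself copies the data into a page-locked buffer, so whether the change is a net win depends on the tensor sizes and on how often the path runs; the profiles in this PR are the evidence for that, not this sketch.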

Accuracy Tests

Benchmarking and Profiling

Before:

After:

Checklist

- Format your code according to "Format code with pre-commit".
- Add unit tests according to "Run and add unit tests".
- Update documentation according to "Write documentations".
- Provide accuracy and speed benchmark results according to "Test the accuracy" and "Benchmark the speed".
- Follow the SGLang code style guidance.

Review Process

- Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
- Get approvals from CODEOWNERS and other reviewers.
- Trigger CI tests with comments or contact authorized users to do so.
  - /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
- After green CI and required approvals, ask Merge Oncalls to merge.

gemini-code-assist · 2026-03-25T04:06:18Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

sglang-npu-bot · 2026-03-30T06:30:34Z

/tag-and-rerun-ci

sglang-npu-bot · 2026-04-02T06:08:17Z

/rerun-failed-ci

shadowxz109 · 2026-04-06T09:28:13Z

/rerun-failed-ci

shadowxz109 · 2026-04-06T09:54:39Z

/rerun-failed-ci

litmei · 2026-04-07T16:42:46Z

+        if _pin:
+            mrope_positions_cat = torch.cat(
+                [pos for pos in mrope_positions_list],
+                dim=1,
+            ).pin_memory()
+        else:
+            mrope_positions_cat = torch.cat(
+                [pos for pos in mrope_positions_list],
+                dim=1,
+            )
+        self.mrope_positions = mrope_positions_cat.to(
+            dtype=torch.int64, device=model_runner.device
+        )


could this part be simpler?
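One possible simplification, as a sketch only: concatenate once, then pin conditionally. The function name `build_mrope_positions` and the `pin` parameter are hypothetical stand-ins for the surrounding method and the `_pin` flag in the diff; the behavior is intended to match the two branches above.

```python
import torch


def build_mrope_positions(mrope_positions_list, device, pin: bool) -> torch.Tensor:
    """Concatenate per-request mrope positions and move them to `device`.

    Hypothetical standalone version of the diff above, for illustration.
    """
    # torch.cat accepts the list directly; the list comprehension in the
    # diff only makes a shallow copy of it.
    cat = torch.cat(mrope_positions_list, dim=1)
    if pin:
        # Only valid for CPU tensors on a platform with pinned-memory support.
        cat = cat.pin_memory()
    return cat.to(dtype=torch.int64, device=device)


# Usage on CPU, with pinning disabled so it runs without an accelerator.
parts = [torch.zeros((3, 2), dtype=torch.int32), torch.ones((3, 3), dtype=torch.int32)]
merged = build_mrope_positions(parts, torch.device("cpu"), pin=False)
```

This keeps a single `torch.cat` call and a single `.to(...)` call, so the pinned and unpinned paths differ only in the one conditional line.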

litmei added 2 commits March 25, 2026 11:46

forward batch init new use pin_memory

8088636

comment

6ac4e68

litmei requested review from Fridge003, Ying1123, hnyls2002, ispobock and merrymercy as code owners March 25, 2026 04:06

Merge branch 'main' into decode_low_latency

23ce5bc

github-actions bot added the run-ci label Mar 30, 2026

sglang-npu-bot and others added 4 commits March 30, 2026 14:48

Merge branch 'main' into decode_low_latency

afa7011

Merge branch 'main' into decode_low_latency

10141d9

is_npu branch

e804e48

is_npu branch

3a6d934

litmei and others added 2 commits April 6, 2026 20:12

Merge branch 'main' into decode_low_latency

62f43c2

update

6cd65a3

litmei commented Apr 7, 2026

Comment thread python/sglang/srt/model_executor/forward_batch_info.py

fix xeon platform

4d0c91f

litmei commented Apr 7, 2026

litmei commented Apr 8, 2026

Comment thread python/sglang/srt/model_executor/forward_batch_info.py

update is_pin_memory_available

9feb0a4

litmei mentioned this pull request Apr 8, 2026

[Feature] Add zero bubble for spec v2 #21895

Open

5 tasks

litmei and others added 2 commits April 9, 2026 10:07

Merge branch 'main' into decode_low_latency

9f364ff

fix pin_memory not work

11e3b33

litmei wants to merge 13 commits into sgl-project:main from litmei:decode_low_latency


3 participants
